Annotating Cell Types

This notebook can be run after the standard workflow. It demonstrates how to use the annotation functions to annotate a dataset that has been processed with the standard workflow.

In this notebook, we show how to use the besca annotation functions to assign cell types to clusters. We focus on immune cell types and demonstrate the signature-scoring functions.

If an annotated training dataset already exists, an alternative is to use the auto-annot module. Please refer to the corresponding tutorial.

In [1]:
import besca as bc
import numpy as np
import pandas as pd
import scanpy as sc  # scanpy.api is deprecated and scheduled for removal
import matplotlib.pyplot as plt
from scipy import sparse, io
import os
import time
import logging
import seaborn as sns
sc.logging.print_versions()

# for standard processing, set verbosity to minimum
sc.settings.verbosity = 0  # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=80)
version = '2.8'
start0 = time.time()
scanpy==1.5.1 anndata==0.7.3 umap==0.3.10 numpy==1.17.4 scipy==1.5.1 pandas==1.0.3 scikit-learn==0.21.3 statsmodels==0.10.2 python-igraph==0.8.2 leidenalg==0.8.0
In [4]:
### Plot parameters for publication 
def set_pub():    
    small_size = 10
    medium_size = 12
    large_size = 14

    resolution = 300 #in dpi
    plt.rcParams['font.weight'] = 'normal'
    #plt.rc('font', **{'family':'sans-serif','sans-serif':['Helvetica']})
    plt.rc('axes', titlesize=large_size, titleweight = "bold")               # fontsize of the axes title
    plt.rc('axes', labelsize=medium_size, labelweight = "bold")               # fontsize of the x and y labels
    plt.rc('xtick', labelsize=small_size)               # fontsize of the tick labels
    plt.rc('ytick', labelsize=small_size)               # fontsize of the tick labels
    plt.rc('legend', fontsize=small_size, title_fontsize = medium_size)               # legend fontsize
    plt.rc('figure', titlesize=large_size, titleweight = "bold")              # fontsize of the figure title
    plt.rc('savefig', dpi=resolution)                   # higher res outputs

    plt.rcParams['svg.fonttype'] = 'none'


set_pub()
In [5]:
#define standardized file paths
root_path = os.getcwd()
bescapath='./src/bescapub/besca/'

analysis_name = 'StdWf1_PRJCA001063_CRC_besca2'
# folder holding the standard-workflow results (input .h5ad and labelings/)
results_folder = os.path.join(root_path, 'analyzed', analysis_name)
results_file = os.path.join(results_folder, analysis_name + '.annotated.updated.h5ad')
figdir=os.path.join(results_folder, 'figures', '')  # trailing separator, since file names are appended to figdir later
sc.settings.figdir = figdir
clusters='leiden'
In [3]:
adata = sc.read_h5ad(os.path.join(results_folder, analysis_name + '.h5ad') )
adata
Out[3]:
AnnData object with n_obs × n_vars = 57423 × 2033
    obs: 'CELL', 'CONDITION', 'Patient', 'Type', 'Cell_type', 'percent_mito', 'n_counts', 'n_genes', 'leiden'
    var: 'ENSEMBL', 'SYMBOL', 'n_cells', 'total_counts', 'frac_reads'
    uns: 'CONDITION_colors', 'Cell_type_colors', 'Patient_colors', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'rank_genes_groups'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'
In [6]:
sc.pl.umap(adata, color= [clusters], legend_loc='on data', save='_leiden.svg')

Inspect the available public annotation

In [7]:
sc.pl.umap(adata, color= ['Cell_type'], save='_origlabel.svg')

If one trusts this annotation, one can obtain the top markers per annotated cell type and use them to refine the signatures.

In [6]:
### Perform DE for the cells of each Cell_type vs. all other cells
DEgenes=bc.tl.dge.get_de(adata,'Cell_type',demethod='wilcoxon',topnr=5000, logfc=1,padj=0.05)
In [7]:
### Select only top genes (in order of p-val) for 2 cell types and plot expression per cell type
tops=list(DEgenes['Macrophage cell']['Name'][0:15])+list(DEgenes['Stellate cell']['Name'][0:15])
sc.pl.dotplot(adata, var_names=tops,groupby='Cell_type')
Out[7]:
GridSpec(2, 5, height_ratios=[0, 10.5], width_ratios=[10.5, 0, 0.2, 0.5, 0.25])

Inspect various signatures

In [8]:
# One can load besca-provided signatures using the function below
signature_dict = bc.datasets.load_immune_signatures(refined=False)

signature_dict
Out[8]:
{'lymphocyte': ['PTPRC'],
 'myeloid': ['S100A8', 'S100A9', 'CST3'],
 'Bcell': ['CD19', 'CD79A', 'MS4A1'],
 'Tcells': ['CD3E', 'CD3G', 'CD3D'],
 'CD4': ['CD4'],
 'CD8': ['CD8A', 'CD8B'],
 'NKcell': ['NKG7', 'GNLY', 'NCAM1'],
 'monocyte': ['CST3', 'CSF1R', 'ITGAM', 'CD14', 'FCGR3A', 'FCGR3B'],
 'macrophage': ['CD14',
  'IL1B',
  'LYZ    CD163   ITGAX',
  'CD68',
  'CSF1R',
  'FCGR3A']}
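In the output above, the macrophage entry 'LYZ    CD163   ITGAX' holds three gene symbols in a single string, so it would likely not match any gene in adata.var_names. A minimal sketch of a hypothetical helper, split_signature_genes (not part of besca), that splits such entries on whitespace before the signatures are used:

```python
def split_signature_genes(signatures):
    """Return a copy of `signatures` in which every entry holds one gene symbol.

    Entries such as 'LYZ    CD163   ITGAX' are split on whitespace; duplicate
    genes within a signature are dropped while the original order is kept.
    """
    cleaned = {}
    for name, genes in signatures.items():
        flat = []
        for entry in genes:
            for gene in entry.split():
                if gene not in flat:
                    flat.append(gene)
        cleaned[name] = flat
    return cleaned


example = {'macrophage': ['CD14', 'IL1B', 'LYZ    CD163   ITGAX', 'CD68']}
print(split_signature_genes(example))
# {'macrophage': ['CD14', 'IL1B', 'LYZ', 'CD163', 'ITGAX', 'CD68']}
```

In the notebook this would be applied as signature_dict = split_signature_genes(signature_dict).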

Additionally, it is possible to read a GMT file and compute scanpy scores for its signatures using the function below.

If the GMT file contains combined signatures (UP and DN genes), a single combined score is computed: $$\mathrm{Score}_{\mathrm{total}} = \mathrm{Score}_{\mathrm{UP}} - \mathrm{Score}_{\mathrm{DN}}$$
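To make the UP minus DN formula concrete, here is a simplified sketch on a toy dense matrix. It is not besca's implementation: each score is the plain mean expression of the signature genes per cell, whereas scanpy's score_genes additionally subtracts the average expression of a randomly sampled reference set of control genes. The helper names mean_score and combined_score are hypothetical.

```python
import numpy as np


def mean_score(X, gene_names, genes):
    """Mean expression of the given genes per cell (rows of X).

    Simplified on purpose: unlike scanpy's score_genes, no reference set of
    control genes is subtracted. Genes absent from gene_names are ignored.
    """
    idx = [gene_names.index(g) for g in genes if g in gene_names]
    if not idx:
        return np.zeros(X.shape[0])
    return X[:, idx].mean(axis=1)


def combined_score(X, gene_names, up_genes, dn_genes):
    """Total score = Score_UP - Score_DN, computed per cell."""
    return mean_score(X, gene_names, up_genes) - mean_score(X, gene_names, dn_genes)


gene_names = ['G1', 'G2', 'G3', 'G4']
X = np.array([[2.0, 4.0, 1.0, 0.0],
              [0.0, 0.0, 3.0, 5.0]])  # 2 cells x 4 genes
total = combined_score(X, gene_names, up_genes=['G1', 'G2'], dn_genes=['G3', 'G4'])
print(total.tolist())  # [2.5, -4.0]
```

With an empty DN list the DN term is zero, so an UP-only signature simply keeps its UP score.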

Signatures for specific sub-populations

In [6]:
 ## PROVIDED WITH BESCA, make a local copy if you'd like to modify/add your own
gmt_file_anno= bescapath + '/besca/datasets/genesets/CellNames_scseqCMs6_sigs.gmt'
#gmt_file_anno= root_path + '/analyzed/'+analysis_name+ 'CellNames_scseqCMs6_sigs.gmt'
bc.tl.sig.combined_signature_score(adata, gmt_file_anno)
In [7]:
scores = [x for x in adata.obs.columns if 'scanpy' in x]
sc.pl.umap(adata, color= scores, color_map = 'viridis')

Signature-based annotation

Besca also provides a decision-tree-based annotation. It reads the signatures from a provided .gmt file, and the hierarchy, cutoffs and signature ordering from a configuration file, then attributes each cluster (and thus each of its cells) to a specific cell type according to signature enrichment.

This is an aid to start the annotation, which can then be refined further by adding signatures or adjusting the configuration file. It was tested mainly on PBMC and oncology (tumor biopsy) samples.

In [8]:
 ## PROVIDED WITH BESCA, make a local copy if you'd like to modify/add your own
gmt_file_anno= bescapath + '/besca/datasets/genesets/CellNames_scseqCMs6_sigs.gmt'
#gmt_file_anno= root_path + '/analyzed/'+analysis_name+ 'CellNames_scseqCMs6_sigs.gmt'

mymarkers = bc.tl.sig.read_GMT_sign(gmt_file_anno,directed=False)
mymarkers = bc.tl.sig.filter_siggenes(adata, mymarkers) ### remove genes not present in dataset or empty signatures
mymarkers['Ubi'] = ['B2M','ACTB', 'GAPDH'] ### used for cutoff adjustment to individual dataset, can be modified
In [9]:
sc.pl.umap(adata, color= mymarkers['Fibroblast'])
#sc.pl.umap(adata,color=['COLEC11','DCN'])

We read the configuration file, containing hierarchy, cutoff and signature priority information. A new version of this file should be created and maintained with each annotation. The included example is optimised for the annotation of the 6.6k PBMC dataset.
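The sketch below illustrates, under stated assumptions, how such a configuration could encode hierarchy, cutoffs and priority. Only the 'Cutoff' and 'Order' columns are referred to by name in this notebook; the 'Name', 'Parent' and 'Level' columns and the toy rows are hypothetical, and the per-level lists only loosely mirror levsk, whose exact structure may differ.

```python
import io

import pandas as pd

# Hypothetical config in the spirit of CellNames_scseqCMs6_config.tsv.
# 'Cutoff' and 'Order' are used in this notebook; 'Name', 'Parent' and
# 'Level' are assumptions for illustration only.
config_tsv = (
    "Name\tParent\tLevel\tOrder\tCutoff\n"
    "Hematopoietic\troot\t0\t1\t1.0\n"
    "Epithelial\troot\t0\t2\t1.0\n"
    "Myeloid\tHematopoietic\t1\t2\t1.0\n"
    "Tcell\tHematopoietic\t1\t1\t1.5\n"
)
config = pd.read_csv(io.StringIO(config_tsv), sep='\t', index_col='Name')

# Signatures per level, sorted by priority ('Order')
levels = {int(level): list(group.sort_values('Order').index)
          for level, group in config.groupby('Level')}
print(levels)  # {0: ['Hematopoietic', 'Epithelial'], 1: ['Tcell', 'Myeloid']}
```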

In [19]:
configfile= bescapath + '/besca/datasets/genesets/CellNames_scseqCMs6_config.tsv'
#configfile=root_path + '/analyzed/'+analysis_name+'/CellNames_scseqCMs6_config.tsv' ### replace this with your config
In [20]:
nomen_plot=bc.pl.nomenclature_network(configfile, font_size=9)  # do not assign to plt: that would shadow matplotlib.pyplot
In [21]:
sigconfig,levsk=bc.tl.sig.read_annotconfig(configfile)

The fract_pos.gct file was exported by besca during the standard workflow; it contains the fraction of positive cells per gene per cluster.

We use these values as the basis for a Mann-Whitney (Wilcoxon rank-sum) test per signature per cluster.
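The exact statistic returned by bc.tl.sig.score_mw is not shown in this notebook. As an assumption about the general idea, the sketch below compares, for each cluster, the fraction-positive values of a signature's genes with those of all other genes using scipy's one-sided Mann-Whitney U test, and reports -log10(p) so that higher values mean stronger enrichment. mw_signature_scores is a hypothetical helper, and the toy table stands in for the genes x clusters content of fract_pos.gct.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def mw_signature_scores(fract_pos, signatures):
    """-log10 p-values of a one-sided Mann-Whitney U test, signatures x clusters.

    fract_pos: genes x clusters DataFrame with the fraction of positive cells.
    For each cluster, the fractions of the signature genes are compared with
    those of all other genes (alternative: signature genes are higher).
    Each signature must contain at least one gene present in fract_pos.
    """
    scores = pd.DataFrame(index=list(signatures), columns=fract_pos.columns, dtype=float)
    for sig, genes in signatures.items():
        in_sig = fract_pos.index.isin(genes)
        for cluster in fract_pos.columns:
            values = fract_pos[cluster].to_numpy()
            _, p = mannwhitneyu(values[in_sig], values[~in_sig], alternative='greater')
            scores.loc[sig, cluster] = -np.log10(p)
    return scores


fract_pos = pd.DataFrame(
    {'0': [0.9, 0.8, 0.1, 0.2, 0.1, 0.0],
     '1': [0.0, 0.1, 0.5, 0.6, 0.7, 0.4]},
    index=['G1', 'G2', 'G3', 'G4', 'G5', 'G6'])
scores = mw_signature_scores(fract_pos, {'SigA': ['G1', 'G2']})
print(scores.round(2))
```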

In [22]:
f=pd.read_csv(results_folder + "/labelings/"+clusters+"/fract_pos.gct",sep="\t",skiprows=2)
df=bc.tl.sig.score_mw(f,mymarkers)
myc=np.median(df.loc['Ubi',:])/2 ### Set a cutoff based on Ubi and scale with values from config file

For each signature, positive and negative clusters are determined, and only the positive clusters are retained. Cutoffs can be individualised per signature through the scaling factor in the 'Cutoff' column of the config file, which is multiplied by myc, a baseline derived from the ubiquitously expressed genes (Ubi).
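A minimal sketch of this thresholding on toy numbers, mirroring the cutoff sigconfig.loc[mysig,'Cutoff']*myc used in the next cell. The scores and scaling factors are made up, and whether bc.tl.sig.getset compares strictly greater or greater-or-equal is an assumption.

```python
import numpy as np
import pandas as pd

# Toy enrichment scores (signatures x clusters); 'Ubi' holds the scores of the
# ubiquitously expressed genes, as mymarkers['Ubi'] does in this notebook.
scores = pd.DataFrame({'c0': [8.0, 6.0, 0.5], 'c1': [6.0, 0.3, 6.0]},
                      index=['Ubi', 'Tcell', 'Bcell'])
# Hypothetical per-signature scaling factors, playing the role of the
# config file's 'Cutoff' column
scaling = {'Tcell': 1.0, 'Bcell': 1.5}

base = np.median(scores.loc['Ubi', :]) / 2  # same rule as myc above
positive = {}
for sig in scores.index.drop('Ubi'):
    cutoff = scaling[sig] * base
    # Assumption: a cluster is positive when its score is strictly above the cutoff
    positive[sig] = list(scores.columns[(scores.loc[sig] > cutoff).to_numpy()])

print(base, positive)  # 3.5 {'Tcell': ['c0'], 'Bcell': ['c1']}
```

A larger scaling factor makes a signature stricter: Bcell needs a score above 5.25 here, while Tcell only needs 3.5.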

In [23]:
#Cluster attribution based on cutoff
df=df.drop('Ubi')
sigscores={}
for mysig in list(df.index):
    sigscores[mysig]=bc.tl.sig.getset(df,mysig,sigconfig.loc[mysig,'Cutoff']*myc)
    #sigscores[mysig]=bc.tl.sig.getset(df,mysig,10)
In [24]:
g = sns.clustermap(df.loc[df.max(axis=1)>myc*2.5,:].astype(float),figsize=(18, 12))
g.savefig(figdir+"SignatureHeatmap_all.svg", format="svg")  # save via the returned ClusterGrid, independent of plt
In [25]:
mysig
Out[25]:
'ILC2'

One can inspect the cluster attribution per cell type in the signature list and adjust cutoffs as required.

In [26]:
#sigscores

Each cluster is now annotated according to the distinct levels specified in the config file. Note that if a cluster is positive for multiple identities, only the first one is taken, in the order specified in the "Order" column of the config file.

To check the order per level, inspect levsk.
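The following is a sketch of the idea behind this level-by-level assignment, not the internals of bc.tl.sig.make_anno. It assumes each level lists its signatures in priority order, that a child is only considered under the label chosen at the level above, and that a cluster with no positive child keeps its last label at the deeper levels (as in the cnames output further down, where e.g. Fibroblast is repeated across all four levels). annotate_cluster and the toy inputs are hypothetical.

```python
def annotate_cluster(cluster, levels, parent, positive, unknown='unknown'):
    """Assign one label per level to `cluster`, walking the hierarchy top-down.

    levels:   list of signature lists, one per level, each sorted by priority
    parent:   signature -> parent signature (None for the top level)
    positive: signature -> clusters that passed that signature's cutoff
    At each level, the first positive signature (in priority order) whose parent
    is the label chosen one level above wins. When no child is positive, the
    remaining levels repeat the last label.
    """
    labels = []
    current = None
    for level_sigs in levels:
        hits = [s for s in level_sigs
                if parent.get(s) == current and cluster in positive.get(s, [])]
        if not hits:
            break
        current = hits[0]  # priority order decides between multiple hits
        labels.append(current)
    while len(labels) < len(levels):
        labels.append(labels[-1] if labels else unknown)
    return labels


levels = [['Hematopoietic', 'Epithelial'], ['Tcell', 'Myeloid'], ['CD8Tcell', 'CD4Tcell']]
parent = {'Hematopoietic': None, 'Epithelial': None, 'Tcell': 'Hematopoietic',
          'Myeloid': 'Hematopoietic', 'CD8Tcell': 'Tcell', 'CD4Tcell': 'Tcell'}
positive = {'Hematopoietic': ['0', '1'], 'Epithelial': ['2'], 'Tcell': ['0'],
            'Myeloid': ['0', '1'], 'CD8Tcell': ['0'], 'CD4Tcell': ['0']}
for c in ['0', '1', '2']:
    print(c, annotate_cluster(c, levels, parent, positive))
# 0 ['Hematopoietic', 'Tcell', 'CD8Tcell']
# 1 ['Hematopoietic', 'Myeloid', 'Myeloid']
# 2 ['Epithelial', 'Epithelial', 'Epithelial']
```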

In [27]:
#levsk
In [28]:
### One can choose to exclude certain cell types not relevant for the analysed sample 
### or which go too much into detail for the current resolution
toexclude=['Cholangiocyte','Hepatocyte', 'Macrophage_MARCO','Macrophage_MSR1', 'EnterocytePC', 'Pericyte']
In [29]:
cnames=bc.tl.sig.make_anno(df,sigscores,sigconfig,levsk,'celltype',toexclude)

We have now obtained a cell type attribution for each cluster at each level.

In [30]:
cnames
Out[30]:
celltype0 celltype1 celltype2 celltype3
16 Hematopoietic Myeloid Macrophage Macrophage
31 Epithelial Enteroendocrine BetaPancreatic BetaPancreatic
6 Fibroblast Fibroblast Fibroblast Fibroblast
33 Hematopoietic Blymphocyte Plasma Plasma
25 Hematopoietic Tcell CD8Tcell CytotoxCD8Tcell
38 Endothelial VesselEndothelial VesselEndothelial VesselEndothelial
42 Endothelial VesselEndothelial VesselEndothelial VesselEndothelial
30 Fibroblast Fibroblast Fibroblast Fibroblast
20 Hematopoietic Myeloid cDC cDC2
2 Endothelial VesselEndothelial VesselEndothelial VesselEndothelial
36 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
34 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
40 Fibroblast PancStellate PancStellate PancStellate
11 Hematopoietic Tcell CD4Tcell CMCD4Tcell
21 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
41 Hematopoietic Tcell CD8Tcell ProlifCD8Tcell
19 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
13 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
18 Epithelial PancreaticAcinar PancreaticAcinar PancreaticAcinar
39 Fibroblast Fibroblast Fibroblast Fibroblast
43 Hematopoietic Tcell CD4Tcell RegTcell
47 Fibroblast PancStellate PancStellate PancStellate
15 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
9 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
53 Fibroblast PancStellate PancStellate PancStellate
0 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
3 Fibroblast PancStellate PancStellate PancStellate
48 Endothelial VesselEndothelial VesselEndothelial VesselEndothelial
50 Neural Neural Neural Neural
44 Epithelial PancreaticAcinar PancreaticAcinar PancreaticAcinar
46 Fibroblast PancStellate PancStellate PancStellate
12 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
14 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
17 Endothelial VesselEndothelial VesselEndothelial VesselEndothelial
1 Endothelial VesselEndothelial VesselEndothelial VesselEndothelial
32 Epithelial PancreaticAcinar PancreaticAcinar PancreaticAcinar
52 Fibroblast PancStellate PancStellate PancStellate
37 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
45 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
26 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
7 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
4 Hematopoietic Myeloid Macrophage Macrophage
35 Endothelial VesselEndothelial VesselEndothelial VesselEndothelial
28 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
23 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
24 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
29 Hematopoietic Blymphocyte Bcell ProlifBcell
51 Epithelial Enteroendocrine EpsilonPancreatic EpsilonPancreatic
8 Fibroblast PancStellate PancStellate PancStellate
5 Fibroblast Fibroblast Fibroblast Fibroblast
10 Hematopoietic Blymphocyte Bcell MemBcell
22 Fibroblast Fibroblast Fibroblast Fibroblast
27 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
49 Epithelial PancreaticDuctal PancreaticDuctal PancreaticDuctal
In [36]:
adata.obs['celltype0']=bc.tl.sig.add_anno(adata,cnames,'celltype0',clusters)
adata.obs['celltype1']=bc.tl.sig.add_anno(adata,cnames,'celltype1',clusters)
adata.obs['celltype2']=bc.tl.sig.add_anno(adata,cnames,'celltype2',clusters)
adata.obs['celltype3']=bc.tl.sig.add_anno(adata,cnames,'celltype3',clusters)

sc.pl.umap(adata,color=['celltype0']) #,'celltype3'
In [37]:
sc.pl.umap(adata,color=['celltype1']) #,'celltype3'
In [38]:
sc.pl.umap(adata,color=['celltype2']) #,'celltype3'
In [39]:
sc.pl.umap(adata,color=['celltype3']) #,'celltype3'

Only short names were used in the signature naming convention in this case. One can easily transform these to EFO terms if preferred, using the conversion table that comes with besca.

In [40]:
### transform these short forms to dblabel - EFO standard nomenclature
nomenclature=pd.read_csv(bescapath+'/besca/datasets/nomenclature/CellTypes_v1.tsv',sep='\t',header=0,skiprows=range(1, 2))
In [41]:
cnamesDBlabel=[]
for mycol in list(cnames.columns):
    cnamesDBlabel.append([list(nomenclature.loc[nomenclature['short_dblabel']==x,'dblabel'])[0] for x in list(cnames[mycol])])
cnamesDBlabel=pd.DataFrame(cnamesDBlabel).transpose()
cnamesDBlabel.columns=cnames.columns
cnamesDBlabel.index=cnames.index
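The loop above raises an IndexError as soon as a short name is missing from the nomenclature table. A possible alternative, sketched with a hypothetical helper to_dblabel (not part of besca) and a toy table, builds the lookup once, keeps the first dblabel per short name like the loop does, and leaves unknown names untouched so they stand out:

```python
import pandas as pd


def to_dblabel(cnames, nomenclature):
    """Map short labels to dblabel names, column by column.

    Like the loop above, the first dblabel listed for a short name is used;
    unlike it, short names missing from the table are left unchanged.
    """
    first = nomenclature.drop_duplicates(subset='short_dblabel', keep='first')
    mapping = dict(zip(first['short_dblabel'], first['dblabel']))
    return cnames.apply(lambda col: col.map(lambda x: mapping.get(x, x)))


nomenclature_toy = pd.DataFrame({'short_dblabel': ['Tcell', 'Bcell'],
                                 'dblabel': ['T cell', 'B cell']})
cnames_toy = pd.DataFrame({'celltype0': ['Tcell', 'Bcell'],
                           'celltype1': ['Tcell', 'NewType']},
                          index=['3', '7'])
print(to_dblabel(cnames_toy, nomenclature_toy))
```

With the real tables this would read cnamesDBlabel = to_dblabel(cnames, nomenclature); the index and column names of cnames are preserved.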
In [42]:
cnamesDBlabel
Out[42]:
    celltype0           celltype1                      celltype2                        celltype3
16  hematopoietic cell  myeloid leukocyte              macrophage                       macrophage
31  epithelial cell     enteroendocrine cell           type B pancreatic cell           type B pancreatic cell
 6  fibroblast          fibroblast                     fibroblast                       fibroblast
33  hematopoietic cell  lymphocyte of B lineage        plasma cell                      plasma cell
25  hematopoietic cell  T cell                         CD8-positive, alpha-beta T cell  CD8-positive, alpha-beta cytotoxic T cell
38  endothelial cell    blood vessel endothelial cell  blood vessel endothelial cell    blood vessel endothelial cell
42  endothelial cell    blood vessel endothelial cell  blood vessel endothelial cell    blood vessel endothelial cell
30  fibroblast          fibroblast                     fibroblast                       fibroblast
20  hematopoietic cell  myeloid leukocyte              myeloid dendritic cell           CD1c-positive myeloid dendritic cell
 2  endothelial cell    blood vessel endothelial cell  blood vessel endothelial cell    blood vessel endothelial cell
36  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
34  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
40  fibroblast          pancreatic stellate cell       pancreatic stellate cell         pancreatic stellate cell
11  hematopoietic cell  T cell                         CD4-positive, alpha-beta T cell  central memory CD4-positive, alpha-beta T cell
21  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
41  hematopoietic cell  T cell                         CD8-positive, alpha-beta T cell  proliferating CD8-positive, alpha-beta T cell
19  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
13  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
18  epithelial cell     pancreatic acinar cell         pancreatic acinar cell           pancreatic acinar cell
39  fibroblast          fibroblast                     fibroblast                       fibroblast
43  hematopoietic cell  T cell                         CD4-positive, alpha-beta T cell  regulatory T cell
47  fibroblast          pancreatic stellate cell       pancreatic stellate cell         pancreatic stellate cell
15  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
 9  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
53  fibroblast          pancreatic stellate cell       pancreatic stellate cell         pancreatic stellate cell
 0  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
 3  fibroblast          pancreatic stellate cell       pancreatic stellate cell         pancreatic stellate cell
48  endothelial cell    blood vessel endothelial cell  blood vessel endothelial cell    blood vessel endothelial cell
50  neural cell         neural cell                    neural cell                      neural cell
44  epithelial cell     pancreatic acinar cell         pancreatic acinar cell           pancreatic acinar cell
46  fibroblast          pancreatic stellate cell       pancreatic stellate cell         pancreatic stellate cell
12  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
14  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
17  endothelial cell    blood vessel endothelial cell  blood vessel endothelial cell    blood vessel endothelial cell
 1  endothelial cell    blood vessel endothelial cell  blood vessel endothelial cell    blood vessel endothelial cell
32  epithelial cell     pancreatic acinar cell         pancreatic acinar cell           pancreatic acinar cell
52  fibroblast          pancreatic stellate cell       pancreatic stellate cell         pancreatic stellate cell
37  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
45  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
26  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
 7  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
 4  hematopoietic cell  myeloid leukocyte              macrophage                       macrophage
35  endothelial cell    blood vessel endothelial cell  blood vessel endothelial cell    blood vessel endothelial cell
28  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
23  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
24  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
29  hematopoietic cell  lymphocyte of B lineage        B cell                           proliferating B cell
51  epithelial cell     enteroendocrine cell           pancreatic epsilon cell          pancreatic epsilon cell
 8  fibroblast          pancreatic stellate cell       pancreatic stellate cell         pancreatic stellate cell
 5  fibroblast          fibroblast                     fibroblast                       fibroblast
10  hematopoietic cell  lymphocyte of B lineage        B cell                           memory B cell
22  fibroblast          fibroblast                     fibroblast                       fibroblast
27  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell
49  epithelial cell     pancreatic ductal cell         pancreatic ductal cell           pancreatic ductal cell

Finally, one can add the new labels to adata.obs as annotation.

In [25]:
adata.obs['celltype0']=bc.tl.sig.add_anno(adata,cnamesDBlabel,'celltype0',clusters)
adata.obs['celltype1']=bc.tl.sig.add_anno(adata,cnamesDBlabel,'celltype1',clusters)
adata.obs['celltype2']=bc.tl.sig.add_anno(adata,cnamesDBlabel,'celltype2',clusters)
adata.obs['celltype3']=bc.tl.sig.add_anno(adata,cnamesDBlabel,'celltype3',clusters)
In [26]:
sc.pl.umap(adata,color=['celltype0']) #,'celltype3'
In [27]:
sc.pl.umap(adata,color=['celltype1']) #,'celltype3'
In [28]:
sc.pl.umap(adata,color=['celltype2']) #,'celltype3'
In [29]:
sc.pl.umap(adata,color=['celltype3']) #,'celltype3'
In [30]:
sc.pl.umap(adata, color= [ 'CONDITION','Patient'])
In [32]:
sc.pl.umap(adata,color=['CONDITION'], save='_TumorNormal.svg') 
In [31]:
adata.obs['dblabel']=adata.obs['celltype3']
In [32]:
### Save file in case happy with annotation
adata.write(results_file)

Export labelling

Chosen labels can also be exported as a new folder in labelings/
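bc.st.additional_labeling handles this export (and the accompanying DE and summary files) in one call. Purely to illustrate the folder layout seen in its log, labelings/<label>/cell2labels.tsv, here is a minimal sketch of writing a per-cell mapping with pandas. The helper export_cell_labels and its two-column barcode/label format are assumptions and may differ from the file besca actually writes.

```python
import os
import tempfile

import pandas as pd


def export_cell_labels(obs, column, results_folder):
    """Write one line per cell (barcode<TAB>label) to labelings/<column>/cell2labels.tsv.

    The two-column layout is an assumption for illustration and may differ from
    the file that bc.st.additional_labeling writes.
    """
    outdir = os.path.join(results_folder, 'labelings', column)
    os.makedirs(outdir, exist_ok=True)
    path = os.path.join(outdir, 'cell2labels.tsv')
    obs[[column]].to_csv(path, sep='\t', header=False)
    return path


obs = pd.DataFrame({'celltype3': ['Macrophage', 'Plasma']}, index=['AAACCTG-1', 'AAAGATG-1'])
with tempfile.TemporaryDirectory() as tmp:
    path = export_cell_labels(obs, 'celltype3', tmp)
    written = pd.read_csv(path, sep='\t', header=None, index_col=0)
print(written)
```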

In [33]:
### Save labelling
adata = bc.st.additional_labeling(adata, 'celltype3', 'celltype3', 'Major cell types attributed based on HumanCD45p_scseqCMs8', 'schwalip', results_folder)
rank genes per label calculated using method wilcoxon.
mapping of cells to  celltype3 exported successfully to cell2labels.tsv
average.gct exported successfully to file
fract_pos.gct exported successfully to file
labelinfo.tsv successfully written out
./analyzed/StdWf1_PRJCA001063_CRC_besca2/labelings/celltype3/WilxRank.gct written out
./analyzed/StdWf1_PRJCA001063_CRC_besca2/labelings/celltype3/WilxRank.pvalues.gct written out
./analyzed/StdWf1_PRJCA001063_CRC_besca2/labelings/celltype3/WilxRank.logFC.gct written out
In [34]:
### Save labelling
adata = bc.st.additional_labeling(adata, 'dblabel', 'dblabel', 'Cell types attributed according to EFO nomenclature', 'schwalip', results_folder)
rank genes per label calculated using method wilcoxon.
mapping of cells to  dblabel exported successfully to cell2labels.tsv
average.gct exported successfully to file
fract_pos.gct exported successfully to file
labelinfo.tsv successfully written out
./analyzed/StdWf1_PRJCA001063_CRC_besca2/labelings/dblabel/WilxRank.gct written out
./analyzed/StdWf1_PRJCA001063_CRC_besca2/labelings/dblabel/WilxRank.pvalues.gct written out
./analyzed/StdWf1_PRJCA001063_CRC_besca2/labelings/dblabel/WilxRank.logFC.gct written out

Explore marker genes

In [36]:
### In case one needs to create a new level
panepi=adata.obs['Cell_type'].copy()
panepi[panepi.isin(['Acinar cell','Ductal cell type 1','Ductal cell type 2','Endocrine cell'])]='Acinar cell'
panepi[panepi.isin(['T cell','Macrophage cell','B cell'])]='T cell'
adata.obs['Cell_type2']=list(panepi)
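The same coarser level can also be defined from one explicit mapping, which keeps the grouping in a single place and works whether or not 'Cell_type' is categorical. merge_labels is a hypothetical helper, shown on a toy Series with the grouping used in the cell above; in the notebook it would be called as adata.obs['Cell_type2'] = merge_labels(adata.obs['Cell_type'], groups) with the same grouping dictionary.

```python
import pandas as pd


def merge_labels(labels, groups):
    """Collapse several labels into coarser ones.

    groups maps each new label to the list of original labels it replaces;
    labels not listed in any group are kept as they are.
    """
    lookup = {old: new for new, olds in groups.items() for old in olds}
    return labels.astype(str).map(lambda x: lookup.get(x, x))


cell_type = pd.Series(['Ductal cell type 1', 'B cell', 'Stellate cell', 'Acinar cell'],
                      dtype='category')
merged = merge_labels(cell_type, {
    'Acinar cell': ['Acinar cell', 'Ductal cell type 1', 'Ductal cell type 2', 'Endocrine cell'],
    'T cell': ['T cell', 'Macrophage cell', 'B cell'],
})
print(merged.tolist())  # ['Acinar cell', 'T cell', 'Stellate cell', 'Acinar cell']
```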
In [37]:
### Perform DE for the cells of each Cell_type2 vs. all other cells
DEgenes=bc.tl.dge.get_de(adata,'Cell_type2',demethod='wilcoxon',topnr=5000, logfc=1,padj=0.05)
... storing 'Cell_type2' as categorical
In [38]:
sc.pl.umap(adata, color= ['Cell_type2'])
In [39]:
### Select only top 15 genes (in order of p-val) for 2 cell types and plot expression per cell type
tops=list(DEgenes['T cell']['Name'][0:15])+list(DEgenes['Acinar cell']['Name'][0:15])
sc.pl.dotplot(adata, var_names=tops,groupby='Cell_type', dot_max=0.5)
Out[39]:
GridSpec(2, 5, height_ratios=[0, 10.5], width_ratios=[10.5, 0, 0.2, 0.5, 0.25])
In [40]:
sc.pl.dotplot(adata, var_names=mymarkers['Epithelial'],groupby='Cell_type')
#sc.pl.dotplot(adata, var_names=['KRT7','DEFB1','FGFR2','SFRP5','BICC1','SLC12A2'],groupby='Cell_type')
Out[40]:
GridSpec(2, 5, height_ratios=[0, 10.5], width_ratios=[4.199999999999999, 0, 0.2, 0.5, 0.25])

Only healthy patients

In [35]:
normal=adata[adata.obs['CONDITION']=='N'].copy()
sc.pp.neighbors(normal)
sc.tl.leiden(normal)
sc.tl.umap(normal)
In [36]:
sc.pl.umap(normal,color=['Cell_type','celltype3'])
In [37]:
sc.pl.umap(normal,color=['leiden'],legend_loc='on data')
In [38]:
#sc.pl.umap(normal,color=['PPY', 'STMN2', 'PTP4A3', 'DPYSL3'])
sc.pl.umap(normal,color=mymarkers['EpsilonPancreatic'])
In [39]:
normal.write(results_file.replace('.h5ad', '.healthyOnly.h5ad'))  # save the subset of normal (healthy) samples, not the full adata